It's not the voting that's democracy, it's the counting: 
Statistical detection of systematic election irregularities 

Democratic societies are built around the principle of free and fair elections, in which each citizen's
vote counts equally. National elections can be regarded as large-scale social experiments, where
people are grouped into usually large numbers of electoral districts and vote according to their
preferences. The large number of samples implies certain statistical consequences for the polling
results which can be used to identify election irregularities. Using a suitable data collapse, we find
that vote distributions of elections with alleged fraud show a kurtosis a hundred times larger than
that of normal elections. As an example we show that reported irregularities in the 2011 Duma election
are indeed well explained by systematic ballot stuffing, and we develop a parametric model quantifying
the extent to which fraudulent mechanisms are present. We show that if specific statistical properties
are present in an election, the results do not represent the will of the people. We formulate a
parametric test detecting these statistical properties in election results. For demonstration the
model is also applied to election outcomes of several other countries.



Free and fair elections are the cornerstone of every
democratic society [1]. A central characteristic of free
and fair elections is that each citizen's vote counts
equally. However, Joseph Stalin already believed that
"The people who cast the votes decide nothing. The
people who count them decide everything." How can one
distinguish whether an election outcome represents
the will of the people or the will of the counters?

Elections are fascinating, large-scale social experiments.
A country is segmented into a usually large
number of electoral districts. Each district represents
a standardized experiment in which each citizen articulates
his or her political preference via a ballot. Despite differences
in, e.g., income levels, religions and ethnicities
across the populations in these districts, outcomes of
these experiments have been shown to follow certain universal
statistical laws. Huge deviations from these
expected distributions have been reported for the votes
for United Russia, the winning party in the 2011 Duma
election [7], [8].

In general, with an appropriate rescaling of election
data, the distributions of votes and turnout are approximately
Gaussian [5]. Let $W_i$ be the number of
votes for the winning party and $N_i$ the number of voters
in electoral district $i$; then the logarithmic vote rate
is $\nu_i = \log(W_i/N_i)$. In figure 2 we show the distribution
of $\nu_i$ over all electoral districts. To first order the data
from different countries collapse to a Gaussian. The data
for Russia and Uganda clearly fall out of
line. Skewness and kurtosis are listed for each data set
in table S2, confirming these observations quantitatively.
Most strikingly, the kurtosis of the distributions for Russia
(2003, 2007 and 2011) and Uganda deviates by two
orders of magnitude from that of every other country. The only
reasonable conclusion is that the voting results
in Russia and Uganda are driven by mechanisms
or processes other than those at work in the other countries.
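
The rescaling and the two shape statistics can be illustrated with a minimal sketch. It assumes the vote rate is $\log(W_i/N_i)$ as read from the text, uses the raw fourth standardized moment for the kurtosis (the text does not say whether raw or excess kurtosis is reported), and runs on synthetic, made-up districts rather than on any of the election data sets. The function names are hypothetical.

```python
import numpy as np

def rescaled_log_vote_rate(votes_winner, voters):
    """Logarithmic vote rate log(W_i / N_i), rescaled to zero mean and unit variance."""
    w = np.asarray(votes_winner, dtype=float)
    n = np.asarray(voters, dtype=float)
    keep = (w > 0) & (n > 0)  # the logarithm is undefined for districts without votes
    nu = np.log(w[keep] / n[keep])
    return (nu - nu.mean()) / nu.std()

def skewness_and_kurtosis(sample):
    """Third and fourth standardized moments (a Gaussian has kurtosis 3 in this convention)."""
    z = np.asarray(sample, dtype=float)
    z = (z - z.mean()) / z.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4))

# Synthetic, made-up districts for illustration only (not real election data).
rng = np.random.default_rng(0)
voters = rng.integers(500, 5000, size=2000)
share = np.clip(rng.normal(0.4, 0.1, size=2000), 0.01, 0.99)
votes = np.maximum(np.rint(share * voters), 1)

z = rescaled_log_vote_rate(votes, voters)
skew, kurt = skewness_and_kurtosis(z)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

Applied to real district data, the same two numbers would be compared across countries; values of the kurtosis in the hundreds are the signal discussed in the text.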

However, such distributions reveal only part of the
story, and a different representation of the data
helps to gain a deeper understanding. Figure 1 shows
2-d histograms of the number of electoral districts for a
given fraction of voter turnout (x-axis) and for the percentage
of votes for the winning party (y-axis). Results
are shown for recent parliamentary elections in Austria,
Finland, Russia, Spain, Switzerland, and the UK, and
presidential elections in the USA and Uganda. Data were
obtained from the official election homepages of the respective
countries; for more details and more election results,
see SOM. These figures can be interpreted as fingerprints
of the processes and mechanisms leading to the overall
election results. For Russia and Uganda the shape of
these fingerprints is immediately seen to differ from the
other countries. In particular there is a large number of
districts (thousands) with a 100% turnout and,
at the same time, 100% of votes for the winning party.
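
A fingerprint of this kind can be computed with a few lines of NumPy. The sketch below is an illustration under stated assumptions: turnout is taken as valid votes divided by registered voters and capped at 100% (as described for the data in the SOM), the vote share is the winner's fraction of valid votes, and the bin count of 100 and the function name are arbitrary choices, not taken from the paper.

```python
import numpy as np

def election_fingerprint(registered, valid_votes, votes_winner, bins=100):
    """2-d histogram of districts over (turnout, vote share of the winner)."""
    n = np.asarray(registered, dtype=float)
    v = np.asarray(valid_votes, dtype=float)
    w = np.asarray(votes_winner, dtype=float)
    keep = (n > 0) & (v > 0)                      # skip empty districts
    turnout = np.minimum(v[keep] / n[keep], 1.0)  # cap turnout at 100%
    share = w[keep] / v[keep]
    hist, turnout_edges, share_edges = np.histogram2d(
        turnout, share, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    return hist, turnout_edges, share_edges

# Two made-up districts: an ordinary one and one reporting 100% turnout and votes.
hist, t_edges, s_edges = election_fingerprint(
    registered=[100, 100], valid_votes=[55, 120], votes_winner=[25, 120], bins=10
)
print(hist.sum())  # number of districts that entered the histogram
```

A second peak in the last row and column of `hist` corresponds to the cluster at 100% turnout and 100% of votes.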

The shape of these irregularities can be understood
by assuming the presence of the fraudulent
action of ballot stuffing. This means that bundles of
ballots with votes for one party are stuffed into the
urns. Videos purportedly documenting these practices
are openly available on online platforms [9-11]. In one
case the urn is already filled with ballots before the elections
start, e.g. [9]; in other cases members of the election
commission are caught filling out ballots, e.g. [10].
In yet another case the pens in the polling stations are
shown to be erasable, e.g. [11]. Are these incidents
non-representative exceptions or the rule?

FIG. 1. Election fingerprints: 2-d histograms of the number
of electoral districts for a given voter turnout (x-axis)
and the percentage of votes (y-axis) for the winning party (or
candidate) in recent elections from eight different countries
(from left to right, top to bottom: Austria, Finland, Russia,
Spain, Switzerland, Uganda, UK and USA). Color
represents the number of electoral districts. Districts usually
cluster around a given turnout and voting level. In Uganda
and Russia these clusters are 'smeared out' to the upper right
region of the plots, reaching a second peak at 100% turnout
and 100% of votes (red circles). In Finland the main cluster
is smeared out in two directions (indicative of voter mobilization
due to controversies surrounding the True Finns).
In the UK the fingerprint shows two clusters stemming from
rural and urban areas (see SOM).

We develop a parametric model to quantify the extent
of ballot stuffing for a given party to explain the election
fingerprints in figure 1. The distributions for Russia
and Uganda are clearly bimodal. One peak lies at intermediate
levels of turnout and votes and is smeared towards the upper
right part of the plot. The second peak is situated in
the vicinity of the point of 100% turnout and 100% votes. This
suggests two modes of fraud: incremental
and extreme fraud. Incremental fraud means that, at a
given rate, ballots for one party are added to the urn and
votes for other parties are replaced. This occurs within
a fraction $f_i$ of electoral districts. In the election fingerprints
in figure 1 these districts are shifted to the upper
right. Extreme fraud corresponds to reporting nearly all
votes for a single party with an almost complete voter
turnout. This happens in a fraction $f_e$ of districts, which
form a second cluster near 100% turnout and votes for
the incumbent party.

FIG. 2. A simple way to compare data from different elections
in different countries is to present the distributions of
the logarithmic vote rates $\nu_i$ of the winning parties as rescaled
distributions with zero mean and unit variance [5]. Large deviations
from other countries can be seen for Uganda and
Russia with the naked eye.

For simplicity, in the model we assume that within each
electoral district turnout and voter preferences follow a
Gaussian distribution with the mean and standard deviation
taken from the actual sample, see figure S2. With
probability $f_i$ ($f_e$) the incremental (extreme) fraud mechanism
is then applied. Note that if more detailed assumptions
are made about possible mechanisms leading
to large-scale heterogeneities in the data, such as city-country
differences in turnout (UK) or coast-non-coast differences
(USA) (see SOM), this will have an effect on the estimate
of $f_i$. Figure 3 compares the observed and modeled
fingerprint plots for the winning parties in Russia,
Uganda and Switzerland. Model results are shown for
$f_i = f_e = 0$ (fair elections) and for best fits to the data
(see SOM) for $f_i$ and $f_e$. To describe the smearing from
the main peak to the upper right corner, an incremental
fraud probability of around $f_i = 0.64$ is needed for the case
of United Russia. This means fraud in about 64% of the
districts. In the second peak, close to 100% turnout,
there are roughly 3,000 districts with 100%
of votes for United Russia, representing an electorate of
more than two million people. Best fits yield $f_e = 0.05$,
i.e. five percent of all electoral districts experience extreme
fraud. A more detailed comparison of the model
performance for the Russian parliamentary elections of
2003, 2007 and 2011 is found in figure S3. The fraud
parameters for the Uganda data in figure 3 are $f_i = 0.45$
and $f_e = 0.01$.
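
To give a sense of scale, the fitted fractions can be converted into approximate district counts. This is a rough illustration that assumes the district count $n = 95057$ listed for the 2011 Russian election in the SOM table of descriptive statistics and treats $f_i$ and $f_e$ as plain fractions of districts:

```latex
\begin{align*}
f_i\, n &= 0.64 \times 95057 \approx 60{,}800 \quad \text{districts with incremental fraud},\\
f_e\, n &= 0.05 \times 95057 \approx 4{,}750 \quad \text{districts with extreme fraud}.
\end{align*}
```

These are model estimates of how many districts are affected, not counts of individually documented incidents.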

The extent of election irregularities can be visualized
with the cumulative number of votes as a function
of the turnout, figure 4. For each turnout level the total
number of votes from districts with this or a lower
turnout is shown. Each curve corresponds to the respective
election winner in a different country. Normally
these cumulative curves level off and form a plateau from the
party's maximal vote count onward. Again this is not the case for
Russia and Uganda. Both show a boost phase of increased
extreme fraud toward the right end of the distribution
(red circles). Russia never even shows a tendency
to form a plateau.
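
The cumulative curve can be sketched as follows. The code is an illustration, assuming turnout is valid votes over registered voters capped at 100%; the function name and the toy numbers are made up.

```python
import numpy as np

def cumulative_votes_by_turnout(registered, valid_votes, votes_winner):
    """Total votes for the winner from all districts with this or a lower turnout."""
    n = np.asarray(registered, dtype=float)
    v = np.asarray(valid_votes, dtype=float)
    w = np.asarray(votes_winner, dtype=float)
    turnout = np.minimum(v / n, 1.0)
    order = np.argsort(turnout, kind="stable")  # districts sorted by increasing turnout
    return turnout[order], np.cumsum(w[order])

# Three made-up districts.
turnout_sorted, cumulative = cumulative_votes_by_turnout(
    registered=[100, 100, 100], valid_votes=[80, 40, 60], votes_winner=[30, 10, 20]
)
print(turnout_sorted, cumulative)
```

Plotting `cumulative` against `turnout_sorted` gives one curve per election; a curve that keeps rising steeply near a turnout of 1 instead of flattening is the 'boost' described in the text.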

FIG. 3. Comparison of observed and modeled 2-d histograms
for (top to bottom) Russia, Uganda and Switzerland. The
left column shows the actual election fingerprints, the middle
column shows a fit with the fraud model. The column on
the right shows the expected model outcome of fair elections
(i.e. absence of fraudulent mechanisms, $f_i = f_e = 0$). For
Switzerland the fair and fitted models are almost the same.
The results for Russia and Uganda can only be explained by
the model assuming a large number of fraudulent districts.

It is imperative to emphasize that the shape of the fingerprints
in figure 1 will deviate from pure 2-d Gaussian
distributions due to non-fraudulent mechanisms, such as
heterogeneities in the population or voter mobilization,
see SOM. However, these can under no circumstances explain
the mode of extreme fraud. A bad forgery is the
ultimate insult.

It can be said with near certainty that an election
does not represent the will of the people if a substantial
fraction ($f_e$) of districts reports 100% turnout with almost
all votes for a single party, and/or if any significant
deviations from the sigmoid form in the cumulative distribution
of votes versus turnout are observed. Another
indicator of systematic fraudulent or irregular voting behavior
is a kurtosis of the logarithmic vote rate distribution
of the order of several hundred.

Should such signals be detected, it is tempting to invoke
G.B. Shaw, who held that "[d]emocracy is a form of
government that substitutes election by the incompetent
many for appointment by the corrupt few."

FIG. 4. The ballot stuffing mechanism can be visualized by
considering the cumulative number of votes as a function of 
turnout. Each country's election winner is represented by a 
curve which typically takes the shape of a sigmoid function 
reaching a plateau. In contrast to the other countries, Russia 
and Uganda do not tend to develop this plateau but instead 
show a pronounced increase (boost) close to complete turnout. 
Both irregularities are indicative of the two ballot stuffing 
modes being present. 

SUPPORTING ONLINE MATERIAL

The data

Descriptive statistics and official sources of
the election results are shown in table S1. The
raw data will be made available for download at
http://www.complex-systems.meduniwien.ac.at/.
Each data set reports election results of parliamentary
(Austria, Finland, Russia, Spain, Switzerland and UK)
or presidential (Uganda, USA) elections at district
level. In the rare circumstances where electoral districts
report more valid ballots than registered voters, we work
with a turnout of 100%. With the exception of the US
data, each country reports the number of registered
voters and valid ballots for each party and district. For
the US there is no exact data on the voting-eligible
population at district level, which was estimated to be
the same as the population above 18 years, available
at http://census.gov. Fingerprints for the 2000 US
presidential elections are shown in figure S1 for both
candidates for districts from the entire USA and Florida
only. There are no irregularities discernible.



FIG. 1. Turnout against percentage of votes for Bush (left column)
and Gore (right) in the 2000 US presidential elections.
Results are shown for all districts in the USA (top row) and
for districts from Florida (bottom). There are no traces of
fraudulent mechanisms discernible in the fingerprints.

Model

A country is separated into $n$ electoral districts $i$, each
having an electorate of $N_i$ people and in total $V_i$ valid
votes. The fraction of valid votes for the winning party
in district $i$ is denoted $v_i$. The average turnout over all
districts, $\bar{a}$, is given by $\bar{a} = \frac{1}{n}\sum_i (V_i/N_i)$ with standard
deviation $s_a$; the mean fraction of votes for the
winning party is $\bar{v} = \frac{1}{n}\sum_i v_i$ with standard deviation
$s_v$. The mean values $\bar{a}$ and $\bar{v}$ are typically close to but
not identical to the values which maximize the empirical
distribution functions of turnout and votes over all districts.
Let $v$ be the vote fraction where the empirical
distribution function assumes its (first local) maximum
(rounded to whole percents), see figure S2. Similarly $a$
is the turnout where the empirical distribution function
of turnouts $a_i$ takes its (first local) maximum. The distributions
for turnout and votes are extremely skewed to
the right for Uganda and Russia, which also inflates the
standard deviations in these countries, see table S1. To
account for this a 'left-sided' ('right-sided') mean deviation
$\sigma_v^L$ ($\sigma_v^R$) from $v$ is introduced. $\sigma_v^R$ can be regarded
as the incremental fraud width, a measurable parameter
quantifying how intense the vote stuffing is. This contributes
to the 'smearing out' of the main peaks in the
election fingerprints, see figure 1 in the main text. The
larger $\sigma_v^R$, the more inflated the vote results due to urn
stuffing, in contrast to $\sigma_v^L$ which quantifies the scatter
of the voters' actual preferences. They can be estimated
from the data by

$$\sigma_v^L = \sqrt{\left\langle (v_i - v)^2 \right\rangle_{v_i < v}} , \qquad (1)$$

$$\sigma_v^R = \sqrt{\left\langle (v_i - v)^2 \right\rangle_{v_i > v}} , \qquad (2)$$

where the angle brackets denote the average over all districts
satisfying the condition in the subscript.
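
The estimation of $v$, $\sigma_v^L$ and $\sigma_v^R$ can be sketched as follows. Two simplifications are assumptions of this sketch and not the procedure of the text: the mode is taken as the global maximum of the histogram in 1% bins rather than the first local maximum, and an empty side returns a width of zero. The function names are hypothetical.

```python
import numpy as np

def vote_mode(shares):
    """Vote fraction (rounded to whole percent) where the histogram is largest.

    Simplification: global maximum, not the 'first local maximum' of the text.
    """
    pct = np.rint(100 * np.asarray(shares, dtype=float)).astype(int)
    counts = np.bincount(pct, minlength=101)
    return int(np.argmax(counts)) / 100.0

def one_sided_widths(shares, mode):
    """Root-mean-square deviation from the mode, separately for each side."""
    s = np.asarray(shares, dtype=float)
    left, right = s[s < mode], s[s > mode]
    sigma_l = float(np.sqrt(np.mean((left - mode) ** 2))) if left.size else 0.0
    sigma_r = float(np.sqrt(np.mean((right - mode) ** 2))) if right.size else 0.0
    return sigma_l, sigma_r

# Made-up vote fractions for six districts.
shares = [0.30, 0.40, 0.40, 0.40, 0.50, 0.60]
v_mode = vote_mode(shares)
sigma_l, sigma_r = one_sided_widths(shares, v_mode)
print(v_mode, sigma_l, sigma_r)
```

A right-sided width that is much larger than the left-sided one is what the text interprets as the 'smearing out' of the main peak.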



Similarly the extreme fraud width $\sigma_x$ can be estimated,
i.e. the width of the peak around 100% votes. We found
that $\sigma_x = 0.075$ describes all encountered vote distributions
reasonably well. For a visualization of $\sigma_v^L$, $\sigma_v^R$ and
$\sigma_x$ see figure S2.

FIG. 2. A stylized version of an empirical vote distribution
function shows how $v$, $\sigma_v^L$, $\sigma_v^R$ and $\sigma_x$ are derived from the
election results. $v$ is the location of the maximum of the distribution
function. $\sigma_v^L$ measures the distribution width of values to the left
of $v$, i.e. smaller than $v$. The incremental fraud width $\sigma_v^R$
measures the distribution width of values to the right of $v$,
i.e. larger than $v$. The extreme fraud width $\sigma_x$ is the width
of the peak at 100% votes.

TABLE I. Descriptive statistics of the election result datasets. Each row in the table corresponds to one election of the given
type in the respective country. The number of electoral districts $n$, mean turnout $\bar{a}$ and mean votes for the winning party $\bar{v}$ are shown
together with estimates for the fraud parameters $f_i$ and $f_e$, as well as the sources where the data can be downloaded (as of 2011).

Country      Election           n      $\bar{a}$      $\bar{v}$      $f_i$  $f_e$  Source
Austria      parliament (2008)  2542   0.74 ± 0.06    0.26 ± 0.11    0.04   0.00   http://www.bmi.gv.at/cms/BMI_wahlen/nationalrat/start
Finland      parliament (2011)  2350   0.70 ± 0.09    0.17 ± 0.10    0.00   0.00   http://pxweb2.stat.fi/database/StatFin/vaa/evaa/Edusk
Russia       parliament (2003)  95181  0.62 ± 0.17    0.40 ± 0.17    0.31   0.01   http://cikrf.ru/
Russia       parliament (2007)  96192  0.70 ± 0.17    0.68 ± 0.14    0.30   0.04   http://cikrf.ru/
Russia       parliament (2011)  95057  0.64 ± 0.18    0.50 ± 0.30    0.64   0.05   http://cikrf.ru/
Spain        parliament (2008)  8112   0.78 ± 0.08    0.40 ± 0.14    0.00   0.00   http://www.infoelectoral.mir.es/
Switzerland  parliament (2007)  2705   0.50 ± 0.09    0.31 ± 0.15    0.04   0.00   http://www.portal-stat.admin.ch/nrw/files/de/01b2.xml
Uganda       president (2011)   23968  0.59 ± 0.15    0.68 ± 0.20    0.45   0.01   http://www.ec.or.ug/eresults.html
UK           parliament (2010)  650    0.65 ± 0.06    0.35 ± 0.16    0.05   0.00   http://www.electoralcommission.org.uk/elections/resul
USA          president (2000)   3115   0.57 ± 0.15    0.40 ± 0.12    0.03   0.00   http://nationalatlas.gov/atlasftp.html#eldistp
USA          president (2008)   3117   0.59 ± 0.09    0.42 ± 0.14    0.01   0.00   http://nationalatlas.gov/atlasftp.html#eldistp

TABLE II. Skewness and kurtosis of the rescaled zero-mean, unit-variance
distributions of $\nu_i$ shown in figure 2 and for the
remaining datasets listed in table S1. Russia and Uganda fall
strongly out of line, with deviations in skewness of about one
order of magnitude and deviations in kurtosis of mostly two
orders of magnitude, compared to each other country.

Country skewness kurtosis

While $f_i$ and $f_e$ measure in how many districts incremental
and extreme fraud occur, $\sigma_v^R$ and $\sigma_x$ quantify how
intense these activities are, if they occur. To get an estimate
for the width of the distribution of turnouts over
districts which is free of possible fraudulent influences, the
turnout distribution width $\sigma_a$ is calculated from electoral
districts $i$ which have both $v_i < v$ and $a_i < a$, that is

$$\sigma_a = \sqrt{\left\langle (a_i - a)^2 \right\rangle_{(v_i < v) \wedge (a_i < a)}} .$$

Incremental fraud is a combination of two processes:
stuffing ballots for one party into the urn and re-casting
or deliberately wrong-counting ballots from other parties
(e.g. erasing the cross). Which one of these two processes
dominates is quantified by the deliberate wrong-counting
parameter $\alpha > 0$. For $0 < \alpha < 1$ the wrong-counting
process dominates, for $\alpha > 1$ the urn stuffing mechanism
is prevalent. In the following $\mathcal{N}(\mu, \sigma)$ denotes a normally
distributed random variable with mean $\mu$ and standard
deviation $\sigma$. The model is specified by the following protocol,
which is applied to each district.

• Pick a district $i$ with electorate $N_i$ taken from the
data.

• The model turnout of district $i$, $a_i^{(m)}$, is drawn from $\mathcal{N}(a, \sigma_a)$.

• A fraction $v_i^{(m)} \in \mathcal{N}(v, \sigma_v^L)$ of people vote for the
winning party.

• With probability $f_i$ incremental fraud takes
place. In this case the district is assigned a
fraud intensity $x_i \in |\mathcal{N}(0, \sigma_v^R)|$. Values for
$x_i$ are only accepted if they lie in the range
$0 < x_i < 1$. This is the fraction of the votes not cast,
$(1 - a_i^{(m)}) N_i$, which are added to the winning
party. Votes for the opposition are wrongly counted
for the winning party at a rate $x_i^{\alpha}$ (where $\alpha$
is an exponent). To summarize, if incremental
fraud takes place the winning party receives

$$N_i \left( a_i^{(m)} v_i^{(m)} + x_i \left(1 - a_i^{(m)}\right) + x_i^{\alpha} \left(1 - v_i^{(m)}\right) a_i^{(m)} \right)$$

votes.

• With probability $f_e$ extreme fraud takes place. In
this case opposition votes are canceled and added
to the winning party with probability $y_i \in 1 - |\mathcal{N}(0, \sigma_x)|$
(i.e. the above with $y_i$ replacing $x_i$).
Acceptable values for $y_i$ are again from the range
$0 < y_i < 1$.

FIG. 3. Comparison of results from the 2003, 2007 and 2011
Russian parliamentary elections and the fraud model. In the
left column the distribution of the number of districts with
a given percentage of votes for United Russia is shown for
data (blue) and fraud model (red). The middle column shows
the observed turnout-against-votes distributions. The data
from 2007 and 2011 show the same pattern, although the
main cluster for United Russia is at a higher percentage of
votes. For 2003 there is a smaller number of districts with
100% turnout and votes, and the main cluster is spread out
less. The right column shows fits for the data with the fraud
model, using parameters $f_i = 0.31$, $f_e = 0.01$ (2003), $f_i = 0.30$,
$f_e = 0.04$ (2007) and $f_i = 0.64$, $f_e = 0.05$ (2011).
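
The district-level protocol listed above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that the text does not settle: Gaussian draws outside [0, 1] are clipped, the two fraud types are drawn independently with extreme fraud overriding incremental fraud, stuffed ballots raise the reported turnout, and the fraud intensities use the widths directly as standard deviations. All names and the parameter values in the usage lines are hypothetical.

```python
import numpy as np

def _abs_normal_in_unit_interval(rng, sigma, size):
    """Draw |N(0, sigma)| and redraw rejected values until all lie in (0, 1). Needs sigma > 0."""
    x = np.abs(rng.normal(0.0, sigma, size))
    bad = (x <= 0.0) | (x >= 1.0)
    while bad.any():
        x[bad] = np.abs(rng.normal(0.0, sigma, int(bad.sum())))
        bad = (x <= 0.0) | (x >= 1.0)
    return x

def simulate_election(electorate, a, v, sigma_a, sigma_l, sigma_r, sigma_x,
                      f_i, f_e, alpha, seed=0):
    """Return the reported (turnout, vote share) per district under the fraud protocol."""
    rng = np.random.default_rng(seed)
    n_i = np.asarray(electorate, dtype=float)
    k = n_i.size
    # Fair part. Clipping to [0, 1] is an assumption of this sketch.
    turnout = np.clip(rng.normal(a, sigma_a, k), 0.0, 1.0)
    share = np.clip(rng.normal(v, sigma_l, k), 0.0, 1.0)
    # Fraud intensity per district (0 means no fraud). Assumption: independent
    # draws for the two fraud types, extreme fraud overrides incremental fraud.
    x = np.zeros(k)
    incremental = rng.random(k) < f_i
    extreme = rng.random(k) < f_e
    x[incremental] = _abs_normal_in_unit_interval(rng, sigma_r, int(incremental.sum()))
    x[extreme] = 1.0 - _abs_normal_in_unit_interval(rng, sigma_x, int(extreme.sum()))
    wrong_count = np.where(x > 0, x ** alpha, 0.0)  # no wrong counting without fraud
    # Winner's votes: genuine votes + stuffed ballots + wrongly counted opposition votes.
    winner = n_i * (turnout * share + x * (1 - turnout) + wrong_count * (1 - share) * turnout)
    reported_turnout = turnout + x * (1 - turnout)  # stuffed ballots count as cast
    ballots = n_i * reported_turnout
    reported_share = np.divide(winner, ballots, out=np.zeros(k), where=ballots > 0)
    return reported_turnout, reported_share

# Made-up parameters: 5000 districts of 1000 voters each.
electorate = np.full(5000, 1000)
params = dict(a=0.6, v=0.4, sigma_a=0.1, sigma_l=0.1, sigma_r=0.2, sigma_x=0.075, alpha=2.0)
fair_t, fair_s = simulate_election(electorate, f_i=0.0, f_e=0.0, **params)
fraud_t, fraud_s = simulate_election(electorate, f_i=0.0, f_e=1.0, **params)
print(fair_t.mean(), fraud_t.mean())
```

Feeding the simulated turnouts and shares into a 2-d histogram gives a model fingerprint that can be compared with the observed one.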



Fitting the model

The parameters for incremental and extreme fraud, $f_i$
and $f_e$, as well as the deliberate wrong-counting parameter
$\alpha$, are estimated by a goodness-of-fit test. Let $\mathrm{pdf}(v_i)$
be the empirical distribution function of votes for the
winning party (in percent) over all districts. The distribution
function for the model districts, $\mathrm{pdf}(v_i^{(m)})$, is calculated
for each set of $(f_i, f_e, \alpha)$ values where
$f_i, f_e \in \{0, 0.01, 0.02, \ldots, 1\}$ and $\alpha \in \{0, 0.1, \ldots, 5\}$. We report values
for the fraud parameters where the statistic

$$S(f_i, f_e, \alpha) = \sum_i \frac{\left( \mathrm{pdf}(v_i) - \mathrm{pdf}(v_i^{(m)}) \right)^2}{\mathrm{pdf}(v_i)} \qquad (3)$$

assumes its minimum, see table S1 for $f_i$ and $f_e$.

The extreme fraud parameter $f_e$ is zero for all elections
except Russia (2003, 2007 and 2011) and Uganda. These
are also the only elections where the incremental fraud
parameter $f_i$ is not close to zero, say $f_i > 0.1$. It is
interesting to note that $\alpha$ assumes the same value for
all Russian elections, $\alpha_{\mathrm{Russia}} = 2$, different from Uganda
where $\alpha_{\mathrm{Uganda}} = 0.3$. Values of $\alpha$ for countries where
$f_i$ is close to zero cannot be determined in a robust way and
are superfluous, since there are (almost) no deviations
from the fair election case.

Special care is needed in the interpretation of $f_i$ and $f_e$
values in countries where election districts contain several
polling stations. It may be the case that extreme
fraud takes place only in a subset of the polling stations within
a district. In that case extreme fraud would be
indistinguishable from the incremental fraud mechanism.
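
The parameter search described in this section can be sketched as a plain grid search. The sketch assumes a Pearson-type statistic (squared difference divided by the empirical value, summed over percentage bins) and skips bins where the empirical distribution is zero; both points are assumptions of this illustration. `model_pdf` is a hypothetical callable that returns the model distribution for a given parameter triple, for example by histogramming simulated districts.

```python
import numpy as np

def vote_pdf(shares, bins=100):
    """Empirical distribution of vote fractions over equally wide bins on [0, 1]."""
    hist, _ = np.histogram(shares, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def fit_statistic(pdf_data, pdf_model):
    """Sum over bins of (data - model)^2 / data, skipping bins with no data."""
    d = np.asarray(pdf_data, dtype=float)
    m = np.asarray(pdf_model, dtype=float)
    ok = d > 0
    return float(np.sum((d[ok] - m[ok]) ** 2 / d[ok]))

def grid_search(pdf_data, model_pdf, f_i_grid, f_e_grid, alpha_grid):
    """Return (S, f_i, f_e, alpha) for the grid point with the smallest statistic."""
    best = None
    for f_i in f_i_grid:
        for f_e in f_e_grid:
            for alpha in alpha_grid:
                s = fit_statistic(pdf_data, model_pdf(f_i, f_e, alpha))
                if best is None or s < best[0]:
                    best = (s, f_i, f_e, alpha)
    return best

# Toy example with three bins and a made-up model that depends on f_i only.
data = np.array([0.5, 0.5, 0.0])
toy_model = lambda f_i, f_e, alpha: np.array([0.5 - f_i, 0.5 + f_i, 0.0])
best = grid_search(data, toy_model, f_i_grid=[0.0, 0.1], f_e_grid=[0.0], alpha_grid=[1.0])
print(best)
```

With the full grids quoted in the text (101 values each for $f_i$ and $f_e$, 51 for $\alpha$) the loop visits roughly half a million parameter triples, so in practice the model evaluation would need to be fast or the grid coarsened.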



On alternative explanations for election
irregularities



It is hard to construct plausible mechanisms other than
urn stuffing that lead to a large number of polling districts
having 100% turnout and votes for a single party.
The case is not so clear for the 'smeared out'
main cluster. In some cases, namely the UK and Finland,
this cluster also takes on a slightly different form. This
effect clearly does not inflate the turnout as much as
in Russia and Uganda, but it is nevertheless
present. In the UK there are well-known large-scale
heterogeneities between urban and rural areas (see e.g.
news.bbc.co.uk/2/hi/uk_news/politics/election)
in both turnout and voter preferences. Generally
speaking, urban areas show a smaller turnout and a
lower preference for the winning Conservatives. Another
possible mechanism is successful voter mobilization.
This may lead to a correlation between turnout and
a party's votes. The Finland elections, for example,
were marked by radical campaigns by the True Finns.
They managed to mobilize voters evenly across
the country, with the exception of the Helsinki region,
where the winning National Coalition Party performed
significantly better than in the rest of the country.



